-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize model cache reusing #21492
Generalize model cache reusing #21492
Conversation
924fc2a
to
824da2b
Compare
824da2b
to
599496a
Compare
auto& ov_version = ov::get_openvino_version(); | ||
m_compiled_model_format["OV_VERSION"] = ov_version.buildNumber; | ||
for (const auto& device : m_device_map) { | ||
m_compiled_model_format[device.first] = device.second->get_info().driver_version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we need to take driver version specific to device we are compile_model on.
But I hope that a driver is the same for all devices and we can collect only driver of first device here (otherwise depending on actual number of devices on the system, we can have different m_compiled_model_format
)
@isanghao could you please confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We print the GPU info in one iGPU + dGPU machine:
for (const auto& device : m_device_map) {
auto driver_info = device.second->get_info();
std::cout << device.first << ": (" << driver_info.dev_name << ", " << driver_info.vendor_id << ", "
<< driver_info.device_id << ", " << driver_info.driver_version << ")" << std::endl;
}
0: (Intel(R) UHD Graphics 770, 32902, 42880, 23.17.26241.33)
1: (Intel(R) Arc(TM) A770 Graphics, 32902, 22176, 23.17.26241.33)
2: (Intel(R) Arc(TM) A770 Graphics, 32902, 22176, 23.17.26241.33)
It seems iGPU and dGPU have the same driver version, I'm not sure whether they can be different driver version in the same machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose driver version is defined by driver_info.driver_version
which is the same for all devices, while other values are per-device properties. I hope that device_id and vendor_id are driver independent.
CC @isanghao
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riverlijunjie I suppose we need to store driver version for a single device, because depending on a number of devices on the system we will have different value for m_compiled_model_format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, will do it!
d073124
to
cae554a
Compare
"[GPU] compiled_model_format: Couldn't find device for GPU with id ", | ||
m_compiled_model_format_device_id); | ||
// Set specified device info for compiled_model_format | ||
model_format_map["DEVICE_ID"] = m_compiled_model_format_device_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to keep device_id
?
caching_properties
already contain device architecture, which identifies the device we compile for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any scenario that device_id changed with the same device if there are multiple GPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use driver_info.vendor_id + driver_info.device_id
to guarantee actual device not change?
That means if driver_info.vendor_id + driver_info.device_id
is same, the device should have the same hardware configuration (cache size, EU number)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any scenario that device_id changed with the same device if there are multiple GPUs?
yes, if we have multiple discrete GPUs and one of them is removed from that machine between consequent runs.
We can use driver_info.vendor_id + driver_info.device_id to guarantee actual device not change?
We don't need to guarantee it in compile_model_format
property. It's a responsibility of caching_properties
, which holds ov::device::architecture
, which does not depend on actual order of the devices.
caching_properties
defines a list of properties, which describe device and hardcoded compilation options, which affect final blob and its hashcompiled_model_format
holds the properties, which can later be changed in runtime (they are not hardcoded).
For example, we compiled out blob and later load it with:
- Different OV runtime version. For plugins like CPU, GPU it's sensitive that OV runtime should be the same, because there are no guarantees that the same blob can be used with multiple OV versions.
- Different driver version (read as execution runtime) for GPU, NPU devices. Drivers can be incompatible with compiled blobs compiled with other drivers (read compilers)
Maybe we need a better name for compiled_model_format
to denote a list of properties, which can be changed in future runs and lead to compiled blob removal (because system is updated and new blob should be triggered).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiled_model_format
holds some runtime software dependencies for device model running, such as driver version, etc. While caching_properties
should hold hardware information, such as architecture, EU number, right?
But we can see that in gpu plugin, caching_properties
has contained ov::intel_gpu::driver_version
, so seems that driver_version is unnecessary in compiled_model_format
?
openvino/src/plugins/intel_gpu/src/plugin/plugin.cpp
Lines 516 to 526 in 17008c9
std::vector<ov::PropertyName> Plugin::get_caching_properties() const { | |
static const std::vector<ov::PropertyName> caching_properties = { | |
ov::PropertyName{ov::device::architecture.name(), PropertyMutability::RO}, | |
ov::PropertyName{ov::intel_gpu::execution_units_count.name(), PropertyMutability::RO}, | |
ov::PropertyName{ov::intel_gpu::driver_version.name(), PropertyMutability::RO}, | |
ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW}, | |
ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RW}, | |
}; | |
return caching_properties; | |
} |
By the way, if we need a proper name for compiled_model_format
, I vote compiled_model_runtime_properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can see that in gpu plugin, caching_properties has contained ov::intel_gpu::driver_version
I suppose we need to move it from caching_properties
to compiled_model_format
. Otherwise, if driver version will be changed, blob hash is changed as well => blob will always be located on the file system and will not be removed by OpenVINO.
If we move driver_version
to compiled_model_format
, then hash is the same, but OV detects that compiled_model_format
is changed => compiled blob must be dropped and regenerated.
compiled_model_format, I vote compiled_model_runtime_properties
I agree
// Set specified device info for compiled_model_format | ||
model_format_map["DEVICE_ID"] = m_compiled_model_format_device_id; | ||
model_format_map["DRIVER_VERSION"] = | ||
m_device_map.at(m_compiled_model_format_device_id)->get_info().driver_version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but as we saw, driver version does not depend on actual device, because when you install driver on the system, it works for all devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we need bind to the actual device, not driver version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual device can be represented by driver_info.vendor_id + driver_info.device_id
?
So that we apply below 3 items:
- vendor_id
- device_id
- driver_version
Does it make sense? @ilya-lavrenov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, have a look at https://github.com/openvinotoolkit/openvino/pull/21492/files#r1426253560
IMO we need to cover only driver version here, because device architecture is already a part of caching_properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from functionality perspective. 👍
Looking forward to checking it out with NPU.
78f639f
to
b5f0a31
Compare
* | ||
* @ingroup ov_dev_api_plugin_api | ||
*/ | ||
static constexpr Property<std::string, PropertyMutability::RO> compiled_model_runtime_properties{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can contain a vector<string>
and actual conversion to string can be done on Core level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know any benefits if we choose std::vector<std::string>
rather than std::string
?
From my understanding, core level doesn't need understand its contents and only read it from blob file's header or send it to plugin side, so std::string
is enough, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it will simplify plugin code, because currently they have to parse / convert from / to string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In plugin side, we adopt ov::AnyMap
and convert it to or from std::string
, seems it is not too complex?
Please fix ie_tests_win_cpu_vs2019_release test |
Details:
Tickets: